This paper considers the application of system identification
techniques using spectral representation for fitting models to
textures and images and consists of two parts. In Part I, we develop
consistent decision rules for choosing the neighborhood in a
one-dimensional autoregressive (AR) model. In Part II, the theory is
extended to the case of stationary two-dimensional random fields.
Part I

Optimal Choice of Neighbors
for One-Dimensional Autoregressive Models


1. Introduction

We are interested in fitting an adequate model to a one-dimensional
observation sequence $Y_N = \{y(1), \ldots, y(N)\}$ obtained from a
stationary stochastic process. For instance, $Y_N$ could arise from
sampling image gray levels at equal intervals along a line. $Y_N$
will be regarded as the output of a one-dimensional autoregressive
(AR) process. Many decision rules are available in the literature
[1,2,3] for fitting a unidirectional autoregressive (UAR) model where
the current observation depends only on past ones. The problem of
fitting a bilateral autoregressive (BAR) model [4], where the current
observation depends on the neighbors on either side, has not been
given much attention. Such models appear to be appropriate when the
process is obtained by sampling an image, since for images there is
no essential difference between the neighbors on one side and those
on the other. In this part we consider the problem of finding the
optimal neighborhood size in a one-dimensional AR model for a given
empirical series.

Two approaches to modeling are the maximum likelihood approach and
the Bayesian approach. We take a Bayesian approach in this paper for
the following reasons: (i) we obtain consistent decision rules for
choosing the best model, and (ii) an explicit expression for the
probability density of observations given a model is obtained, which
will be useful for classification of images and textures.

A comprehensive  theory  is  available  for  fitting  UAR  models 
to  the  given  data.  In  the  maximum  likelihood  approach  one 
maximizes  the  likelihood  function  separately  for  each  model. 

The  best  model  is  then  chosen  by  using  the  decision  statistic 
suggested  by  Akaike  [1]. 

In the Bayesian approach [2] of fitting models to the data, various
plausible models (UAR of order 2, order 3, etc.) are postulated as
mutually exclusive hypotheses $C_i$, $1 \le i \le r$, where $r$ is
the total number of models under consideration. The model which
maximizes the posterior probability density $p(C_i \mid Y_N)$ is
chosen as the best model with minimum probability of error. The
Bayesian approach involves obtaining an expression for
$p(Y_N \mid \theta, C_k)$, $1 \le k \le r$, where $\theta$ is the
parameter vector characterizing the model, and then integrating this
over an appropriate prior probability density function
$p(\theta \mid C_k)$. A Gaussian assumption is usually made about the
noise driving the model in order to obtain a simple expression for
$p(Y_N \mid \theta, C_k)$. A comprehensive theory for comparison of
models more general than autoregressive has been developed in [2],
and the case of independent observations and linear models has been
considered in [5].
In this paper we suggest a Bayesian approach for BAR model fitting.
As time domain analysis is quite complicated for bilateral models, we
resort to spectral domain analysis. Hence instead of maximizing
$P(C_i \mid Y_N)$, we maximize $P(C_i \mid Z_N)$ to obtain the
minimum probability of error decision rule. Since the finite Fourier
transform is a nonsingular transformation with unity Jacobian, the
decision rules maximizing $P(C_i \mid Y_N)$ and $P(C_i \mid Z_N)$ are
equivalent. We first write an expression for
$p(Z_N \mid \theta, C_i)$ using the asymptotic Gaussian properties of
finite Fourier transforms and integrate it w.r.t. $\theta$ by using
an appropriate prior probability density function
$p(\theta \mid C_i)$. Using the expression for $p(Z_N \mid C_i)$, a
decision rule that chooses a correct model with minimum probability
of error is designed. Any Bayesian methodology should answer
criticisms against the assumption of prior densities. In this paper,
we derive $p(Z_N \mid C_i)$ for arbitrary prior densities by using a
theorem from the asymptotic theory of integration [10]. We show that
the decision statistic suggested here reduces to the results reported
in the literature for UAR models [2]. We also establish the
consistency of the decision rule, i.e., the probability of choosing
the jth model when the ith model is true goes to zero uniformly as
$N \to \infty$.

The organization of the paper is as follows: In Section 2, we derive
expressions for $p(Y_N \mid \theta)$ for first order UAR and BAR
models to show the relative complexities of the expressions.
The problem of fitting BAR models has not been given much attention
since Whittle's work [4]. Whittle has shown how to construct UAR
models that have the same autocorrelation as given BAR models, so
that known procedures for UAR model fitting could be applied. But it
has been pointed out that it is the multilateral scheme in general
that corresponds to reality even in those cases for which the formal
work of estimation, etc., is more simply performed using an
equivalent UAR model.

For bilateral models the expression for the likelihood of
observations is a complicated function of the coefficients, since the
Jacobian of the transformation from the noise variates to
observations is not unity. [In Section 2, we derive expressions for
$p(Y_N \mid \theta)$ for one-dimensional UAR and BAR models to
illustrate the complexity of the expressions.] By considering the
likelihood of transforms of observations, one obtains a simple form
for the likelihood function. Specifically, using the asymptotic
Gaussian properties of the finite Fourier transform,
$Z_N = (z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N))$, an
explicit expression can be written for $p(Z_N \mid \theta)$, the
dependence on the parameters appearing through the spectral density
function of the process, evaluated at discrete frequencies
$\lambda_1, \lambda_2, \ldots, \lambda_N$. Using numerical
optimization algorithms [8], the maximum likelihood is evaluated, and
using Akaike's criterion the best model is chosen. This procedure has
been recently considered for a vector random field [9].


In Section 3 we design a decision rule that chooses a BAR model with
minimum probability of error. Section 4 establishes the consistency
of the decision rule. The properties of the decision rule are
discussed in Section 5, and the possible applications are indicated
in Section 6.


In this section we derive the expressions for $p(Y_N \mid \theta)$
[11], when $Y_N$ is assumed to obey a UAR model or a BAR model, in
order to compare the relative complexities of these expressions. For
simplicity, we consider first-order models.

2.1 UAR model

Given $Y_N = (y(1), y(2), \ldots, y(N))$, consider a first order UAR
model

$$y(t) = \phi_1 y(t-1) + \omega(t), \quad 1 \le t \le N \tag{2.1}$$

where $\omega(t)$, $t = 1, 2, \ldots, N$ are independent and
identically distributed Gaussian noise variates, $N(0, \rho)$.
Consider the transformation of random variables from
$\omega(1), \omega(2), \ldots, \omega(N)$ to
$y(1), y(2), \ldots, y(N)$:


The  transformation  in  (2.2)  is  not  exact  since  we  have  not 
considered  the  initial  conditions.  For  large  values  of  N, 
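the disturbance due to initial conditions is negligible. The
resulting density is

$$p(Y_N \mid \phi_1, \rho) = \left(\frac{1}{2\pi\rho}\right)^{N/2} \exp\left[-\frac{N}{2\rho}\left(\hat\rho + \frac{1}{N}\sum_{t=1}^{N} y^2(t-1)\,(\phi_1 - \hat\phi_1)^2\right)\right] \tag{2.7}$$

where $\hat\phi_1$ and $\hat\rho$ are the estimates defined in (2.5).
The exponent of $p(Y_N \mid \phi_1, \rho)$ is a quadratic form in
$\phi_1$ and its further analytical manipulation is easy.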
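
The closed-form estimates in (2.5) and the decomposition in (2.6) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the simulated series, and the choice $y(0) = 0$ for the ignored initial condition are all assumptions made for illustration.

```python
import random

def fit_uar1(y):
    """Closed-form estimates of (2.5) for y(t) = phi1*y(t-1) + w(t).
    y[0] plays the role of the (ignored) initial condition."""
    pairs = list(zip(y[1:], y[:-1]))              # (y(t), y(t-1))
    sxy = sum(cur * prev for cur, prev in pairs)
    sxx = sum(prev * prev for _, prev in pairs)
    phi_hat = sxy / sxx
    rho_hat = sum((cur - phi_hat * prev) ** 2 for cur, prev in pairs) / len(pairs)
    return phi_hat, rho_hat

def residual_sum(y, phi):
    """Sum over t of (y(t) - phi*y(t-1))^2, the exponent term in (2.4)."""
    return sum((cur - phi * prev) ** 2 for cur, prev in zip(y[1:], y[:-1]))

def simulate_uar1(phi, rho, n, seed=0):
    """Hypothetical test data: n samples of a first-order UAR process."""
    rng = random.Random(seed)
    y = [0.0]                                      # assumed initial condition
    for _ in range(n):
        y.append(phi * y[-1] + rng.gauss(0.0, rho ** 0.5))
    return y

y = simulate_uar1(phi=0.6, rho=1.0, n=5000)
phi_hat, rho_hat = fit_uar1(y)
```

Because the cross term vanishes at $\hat\phi_1$, `residual_sum(y, phi)` equals $N(\hat\rho + (1/N)(\phi_1 - \hat\phi_1)^2 \sum y^2(t-1))$ for any trial value of `phi`, which is the identity (2.6).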

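2.2 BAR model

Given $Y_N = (y(1), \ldots, y(N))$, consider the BAR model

$$y(t) = \phi_1 y(t-1) + \phi_2 y(t+1) + \omega(t), \quad 1 \le t \le N \tag{2.8}$$

where $\{\omega(\cdot)\}$ is as in Section 2.1.

Consider the transformation of random variables from
$\omega(1), \omega(2), \ldots, \omega(N)$ to
$y(1), y(2), \ldots, y(N)$:

$$\begin{bmatrix} 1 & -\phi_2 & 0 & \cdots & 0 \\ -\phi_1 & 1 & -\phi_2 & & \vdots \\ 0 & -\phi_1 & 1 & \ddots & 0 \\ \vdots & & \ddots & \ddots & -\phi_2 \\ 0 & \cdots & 0 & -\phi_1 & 1 \end{bmatrix} \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix} = \begin{bmatrix} \omega(1) \\ \vdots \\ \omega(N) \end{bmatrix}$$

or, in compact form,

$$[\Psi_2]\,[y(1), y(2), \ldots, y(N)]^T = [\omega(1), \ldots, \omega(N)]^T \tag{2.9}$$

From the law of transformation of random variables,

$$p(y(1), \ldots, y(N) \mid \phi_1, \phi_2, \rho) = |J|\, p(\omega(1), \omega(2), \ldots, \omega(N)) \Big|_{(\omega(1), \ldots, \omega(N))^T = [\Psi_2]\,(y(1), \ldots, y(N))^T}$$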


By the law of transformation of random variables, for the UAR model
(2.1),

$$p(y(1), \ldots, y(N) \mid \phi_1, \rho) = |J|\, p(\omega(1), \ldots, \omega(N)) \Big|_{(\omega(1), \ldots, \omega(N))^T = [\Psi_1]\,(y(1), \ldots, y(N))^T} \tag{2.3}$$

where $[\Psi_1]$ denotes the matrix of the transformation (2.2). The
Jacobian of the transformation is unity. By using the Gaussian
assumption regarding $\omega(1), \omega(2), \ldots, \omega(N)$, we
obtain

$$p(y(1), y(2), \ldots, y(N) \mid \phi_1, \rho) = \left(\frac{1}{2\pi\rho}\right)^{N/2} \exp\left(-\frac{1}{2\rho} \sum_{t=1}^{N} \big(y(t) - \phi_1 y(t-1)\big)^2\right) \tag{2.4}$$

Let

$$\hat\phi_1 = \sum_{t=1}^{N} y(t)\,y(t-1) \Big/ \sum_{t=1}^{N} y^2(t-1), \quad \text{and} \quad \hat\rho = \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - \hat\phi_1 y(t-1)\big)^2 \tag{2.5}$$

The sum in the exponential term in (2.4) can be rewritten as

$$\sum_{t=1}^{N} \big(y(t) - \hat\phi_1 y(t-1) + \hat\phi_1 y(t-1) - \phi_1 y(t-1)\big)^2 = N\left(\hat\rho + \frac{1}{N} (\phi_1 - \hat\phi_1)^2 \sum_{t=1}^{N} y^2(t-1)\right) \tag{2.6}$$

Substituting (2.6) in (2.4) we obtain (2.7).


3. Decision rule for BAR model selection

We are given a sequence of observations
$Y_N = (y(1), \ldots, y(N))$ and $r$ mutually exclusive compound
hypotheses $C_1, C_2, \ldots, C_r$. To describe $C_i$, consider the
stochastic difference equation $E_i(\phi, \rho)$:

$$E_i: \quad [A_i(\phi, D) + B_i(\phi, D^{-1})]\, y(t) = \omega(t)$$

$$A_i(\phi, D) = 1 + \phi_1 D + \phi_2 D^2 + \cdots + \phi_{m_i} D^{m_i}$$

$$B_i(\phi, D^{-1}) = \phi_{m_i+1} D^{-1} + \phi_{m_i+2} D^{-2} + \cdots + \phi_{n_i} D^{-(n_i - m_i)} \tag{3.1}$$

where

$$D^r y(s) = y(r+s) \tag{3.2}$$

(3.1) is characterized by an $(n_i + 1)$ dimensional vector
$\theta = (\phi^T, \rho)^T$, $\phi \in R^{n_i}$,
$\phi = (\phi_1, \phi_2, \ldots, \phi_{n_i})^T$ with components
$\phi_j$, $j = 1, 2, \ldots, n_i$, and $T$ denotes the transpose
operator. In (3.1), $\omega(t)$, $t = 1, 2, \ldots, N$ are
independent and identically distributed Gaussian random variables
with zero mean and variance $\rho$. When the coefficients in the
expression $B_i(\phi, D^{-1})$ are all identically zero the model
reduces to a unilateral autoregressive model.

We make the following assumption.

A1): The zeros of $(A_i(\phi, D) + B_i(\phi, D^{-1}))$ do not lie on
the unit circle, for all $i$, $1 \le i \le r$.

Let $C = \{E(\phi, \rho);\ \rho > 0,\ \phi \in R^n\} \triangleq (E, m, n)$.

$C$ is a class standing for a set of models all having the same
equation $E$ with the same $m$, $n$ but differing from each other in
the numerical values of the coefficients. As long as their equations
are different, the two classes are different.


From the structure of the equation in (2.9) we see that the Jacobian
of the transformation is not unity, but rather a complicated function
of coefficients. Also note that we do not even obtain a closed form
solution for the maximum likelihood estimate
$(\hat\phi_1, \hat\phi_2)^T$ of $(\phi_1, \phi_2)^T$, as in the case
of UAR models.
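
The contrast between the two Jacobians can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it evaluates the determinant of the banded matrix in (2.9) with the standard three-term recurrence for tridiagonal determinants, and the function name and the coefficient values are hypothetical.

```python
def tridiag_det(n, sub, sup):
    """Determinant of the n x n matrix with 1 on the diagonal,
    `sub` on the sub-diagonal and `sup` on the super-diagonal,
    using D_n = D_{n-1} - sub*sup*D_{n-2}."""
    d_prev, d = 1.0, 1.0            # determinants of sizes 0 and 1
    for _ in range(2, n + 1):
        d_prev, d = d, d - sub * sup * d_prev
    return d

# UAR: only the sub-diagonal (-phi1) is present, so the matrix is triangular.
uar_jac = tridiag_det(50, sub=-0.4, sup=0.0)
# BAR: both -phi1 and -phi2 are present, as in (2.9).
bar_jac = tridiag_det(50, sub=-0.4, sup=-0.3)
```

For the unilateral case the determinant is exactly one for any coefficient value, while for the bilateral case it depends on the product $\phi_1 \phi_2$ and on $N$, which is why the time-domain likelihood is awkward to maximize.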
We are interested in finding a decision rule to assign $Y_N$ to one
of the hypotheses $C_i$, $1 \le i \le r$. It is well known that the
decision rule that chooses the true model with minimum probability of
error is:

Choose hypothesis $k^*$ if

$$k^* = \arg\max_k \{P(C_k \mid Y_N)\} \tag{3.3}$$

Consider a nonsingular transformation with Jacobian unity,
$T: Y_N \to Z_N$. Then the decision rule in (3.3) is equivalent to
(3.4):

Choose hypothesis $k^*$ if

$$k^* = \arg\max_k \{P(C_k \mid Z_N)\} \tag{3.4}$$

Here

$$P(C_i \mid Z_N) = \frac{p(Z_N \mid C_i)\, P(C_i)}{p(Z_N)}$$

where $P(C_i)$, $1 \le i \le r$ are the prior probabilities of
hypotheses and

$$p(Z_N \mid C_i) = \int p(Z_N \mid \theta, C_i)\, p(\theta \mid C_i)\, d\theta$$
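
Once the marginal log-densities $\ln p(Z_N \mid C_i)$ are available, the rule (3.4) is a direct application of Bayes' formula. The following sketch assumes those log-densities have already been computed by some other means; the function names and the numerical values are hypothetical. The maximum is subtracted before exponentiating only to avoid numerical underflow.

```python
import math

def posterior_probabilities(log_evidence, prior):
    """P(C_i | Z_N) from ln p(Z_N | C_i) and the priors P(C_i)."""
    log_joint = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    m = max(log_joint)                        # stabilise the exponentials
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)                      # proportional to p(Z_N)
    return [w / total for w in weights]

def choose_hypothesis(log_evidence, prior):
    """Index k* maximising the posterior probability, as in (3.4)."""
    post = posterior_probabilities(log_evidence, prior)
    return max(range(len(post)), key=post.__getitem__)
```

For example, `choose_hypothesis([-1050.0, -1040.0, -1045.0], [1/3, 1/3, 1/3])` selects the second hypothesis, since with equal priors the largest marginal log-density wins.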

Specifically, consider the case where $T$ is the finite Fourier
transform. This enables us to write an expression for
$p(Z_N \mid \theta, C_i)$ by using Theorem 1 [6][7]:

Theorem 1: Consider the finite Fourier transform of the observations
defined by

$$z(\lambda_i) = N^{-1/2} \sum_{t=1}^{N} e^{-j\lambda_i t}\, y(t) \tag{3.5}$$

where $j = \sqrt{-1}$, $\lambda_i = 2\pi i/N$, $i = 1, 2, \ldots, N$.
Let the observations obey the hypothesis $C_k$. For large values of
$N$, the finite Fourier transforms
$z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N)$ are independent
and distributed normally with zero means and variances
$S(e^{j\lambda_1}, \phi, \rho), S(e^{j\lambda_2}, \phi, \rho), \ldots, S(e^{j\lambda_N}, \phi, \rho)$,
where

$$S(e^{j\lambda_i}, \phi, \rho) = \rho\, H_k(e^{j\lambda_i}, \phi)\, H_k^*(e^{j\lambda_i}, \phi) \tag{3.6}$$

and

$$H_k(e^{j\lambda_i}, \phi) = [A_k(\phi, e^{j\lambda_i}) + B_k(\phi, e^{-j\lambda_i})]^{-1}, \quad 1 \le k \le r,\ 1 \le i \le N \tag{3.7}$$

This theorem allows us to write an expression for the probability
density of the transforms of the observations given the parameters of
the model and the hypothesis it obeys. The likelihood of the
transformed observations is given by

$$\ln p(z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N) \mid \phi, \rho, C_k) = -\frac{N}{2} \ln 2\pi - \frac{1}{2} \sum_{i=1}^{N} \left\{ \ln S(e^{j\lambda_i}, \phi, \rho) + \frac{z(\lambda_i)\, z^*(\lambda_i)}{S(e^{j\lambda_i}, \phi, \rho)} \right\} \tag{3.8}$$

Substituting (3.6) in (3.8) we have

$$\text{LHS of (3.8)} = -\frac{N}{2} \ln 2\pi - \frac{1}{2} \sum_{i=1}^{N} \ln\big(\rho\, \|H_k(e^{j\lambda_i}, \phi)\|^2\big) - \frac{1}{2\rho} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2} \tag{3.9}$$

$$= -\frac{N}{2} \ln 2\pi\rho - \frac{1}{2} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 - \frac{1}{2\rho} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2} \tag{3.10}$$
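
The step from (3.8) to (3.10) can be verified numerically. The following sketch evaluates both forms for the same inputs; it takes the squared magnitudes $\|z(\lambda_i)\|^2$ as given, and the function names and coefficient layout (`fwd` for positive powers of $D$, `bwd` for negative powers) are hypothetical choices for illustration.

```python
import cmath
import math

def h_sq(lam, fwd, bwd):
    """||H(e^{j lam})||^2 = 1 / |A + B|^2, from (3.7)."""
    a = 1.0 + sum(c * cmath.exp(1j * (r + 1) * lam) for r, c in enumerate(fwd))
    b = sum(c * cmath.exp(-1j * (r + 1) * lam) for r, c in enumerate(bwd))
    return 1.0 / abs(a + b) ** 2

def loglik_3_8(z_sq, fwd, bwd, rho):
    """Log-likelihood written with the spectral density S, eq. (3.8)."""
    n = len(z_sq)
    total = -0.5 * n * math.log(2.0 * math.pi)
    for i, zs in enumerate(z_sq, start=1):
        s = rho * h_sq(2.0 * math.pi * i / n, fwd, bwd)
        total -= 0.5 * (math.log(s) + zs / s)
    return total

def loglik_3_10(z_sq, fwd, bwd, rho):
    """The same quantity with rho separated out, eq. (3.10)."""
    n = len(z_sq)
    hs = [h_sq(2.0 * math.pi * i / n, fwd, bwd) for i in range(1, n + 1)]
    return (-0.5 * n * math.log(2.0 * math.pi * rho)
            - 0.5 * sum(math.log(h) for h in hs)
            - sum(zs / h for zs, h in zip(z_sq, hs)) / (2.0 * rho))
```

The two functions agree to rounding error for any admissible coefficients, which is all that the substitution of (3.6) into (3.8) asserts.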


The structure of (3.10) is not in an appropriate form for further
manipulation. We give below an equivalent expression for (3.10) in
Theorem 2. Prior to that we need the following assumption:

ASSUMPTION (A2): The first and second derivatives of

$$\sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 \quad \text{and} \quad \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2}$$

w.r.t. $\phi$ exist for all $\phi \in R^{n_k}$.


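Theorem 2: For large values of $N$,

$$\ln p(z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N) \mid \phi, \rho, C_k) = -\frac{N}{2} f(\hat\phi_k, \hat\rho_k) - \frac{N}{2} \Big[ (\rho - \hat\rho_k)^2 d_k + (\phi - \hat\phi_k)^T Q_k (\phi - \hat\phi_k) + (\rho - \hat\rho_k)(\phi - \hat\phi_k)^T S_k \Big] + O(\|\theta - \hat\theta_k\|^3) \tag{3.11}$$

where

$$f(\hat\phi_k, \hat\rho_k) = \ln \hat\rho_k + \frac{1}{N} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \hat\phi_k)\|^2 + 1 \tag{3.12}$$

$$d_k = \frac{\partial^2 f(\phi, \rho)}{\partial \rho^2} \bigg|_{\phi = \hat\phi_k,\ \rho = \hat\rho_k}$$

$\hat\phi_k$ and $\hat\rho_k$ are maximum likelihood estimates of
$\phi$ and $\rho$,

$$Q_k = \frac{1}{2} \left[ \frac{\partial^2 f(\phi, \rho)}{\partial \phi_i\, \partial \phi_j} \right]_{\phi = \hat\phi_k,\ \rho = \hat\rho_k} \quad (n_k \times n_k \text{ matrix}),\ 1 \le i \le n_k,\ 1 \le j \le n_k \tag{3.14}$$

$$S_k = \left[ \frac{\partial^2 f(\phi, \rho)}{\partial \rho\, \partial \phi_i} \right]_{\phi = \hat\phi_k,\ \rho = \hat\rho_k} \quad (n_k \times 1) \text{ vector} \tag{3.15}$$

$$f(\phi, \rho) = \ln \rho + \frac{1}{N} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 + \frac{1}{N\rho} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2}$$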
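
The relation between $f(\phi, \rho)$ and its value (3.12) at the estimates can be illustrated for a fixed coefficient vector: minimizing over $\rho$ alone gives $\rho = (1/N) \sum \|z(\lambda_i)\|^2 / \|H_k\|^2$, and the last term of $f$ then equals one. The sketch below checks this; the function names and inputs are hypothetical, and the coefficients are held fixed rather than estimated.

```python
import cmath
import math

def h_sq_values(fwd, bwd, n):
    """||H(e^{j lambda_i})||^2 for lambda_i = 2*pi*i/n, i = 1..n."""
    out = []
    for i in range(1, n + 1):
        lam = 2.0 * math.pi * i / n
        a = 1.0 + sum(c * cmath.exp(1j * (r + 1) * lam) for r, c in enumerate(fwd))
        b = sum(c * cmath.exp(-1j * (r + 1) * lam) for r, c in enumerate(bwd))
        out.append(1.0 / abs(a + b) ** 2)
    return out

def f_stat(z_sq, fwd, bwd, rho):
    """f(phi, rho) as defined after (3.15)."""
    n = len(z_sq)
    hs = h_sq_values(fwd, bwd, n)
    return (math.log(rho)
            + sum(math.log(h) for h in hs) / n
            + sum(zs / h for zs, h in zip(z_sq, hs)) / (n * rho))

def rho_hat(z_sq, fwd, bwd):
    """Value of rho minimising f for fixed coefficients."""
    n = len(z_sq)
    hs = h_sq_values(fwd, bwd, n)
    return sum(zs / h for zs, h in zip(z_sq, hs)) / n
```

Evaluating `f_stat` at `rho_hat(...)` reproduces the form $\ln \hat\rho + (1/N) \sum \ln \|H_k\|^2 + 1$ of (3.12), and any other value of $\rho$ gives a larger $f$.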

To obtain $p(z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N) \mid C_k)$
we must integrate (3.11) over $(\phi, \rho)$ by using appropriate
prior probability densities $p(\phi, \rho \mid C_k)$. We do not make
any specific assumption regarding the structure of
$p(\phi, \rho \mid C_k)$. The density must be regular but otherwise
can be arbitrary. We use a theorem from asymptotic integration [9] to
integrate over $\phi$ and $\rho$. An approximate expression for the
posterior density $P(C_k \mid Z_N)$ is given in Theorem 3.

Theorem 3: For large values of $N$,

$$\ln P(C_k \mid Z_N) = -\frac{N}{2} f(\hat\phi_k, \hat\rho_k) + \ln p(\hat\phi_k, \hat\rho_k \mid C_k) + \frac{1}{2}(n_k + 1) \ln(2\pi/N) + \frac{N}{2} \ln 2\pi - \frac{1}{2} \ln \det F_{n_k}(-g(\theta; N)) + \ln P(C_k) - \ln p(Z_N) \tag{3.16}$$

where

$$f(\hat\phi_k, \hat\rho_k) = \ln \hat\rho_k + \frac{1}{N} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \hat\phi_k)\|^2 + 1 \tag{3.17}$$

For practical applications we suggest a simplified decision rule:

Decide hypothesis is $k^*$ if

$$k^* = \arg\min_k \{h_k(Z_N)\}$$

where

$$h_k(Z_N) = N \ln \hat\rho_k + \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \hat\phi_k)\|^2 + n_k \ln N \tag{3.20}$$

The consistency of this decision rule can be proved similarly to that
of (3.17).
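
The simplified rule (3.20) can be sketched as follows. The code assumes that the estimates $\hat\rho_k$ and $\hat\phi_k$ for each candidate model have already been obtained by some fitting procedure, which is not shown; the function names, the tuple layout of a candidate, and the numerical values are hypothetical.

```python
import cmath
import math

def decision_statistic(rho_hat, fwd, bwd, n):
    """h_k of (3.20): N ln(rho_hat) + sum_i ln ||H_k||^2 + n_k ln N."""
    n_k = len(fwd) + len(bwd)                 # number of coefficients
    log_h = 0.0
    for i in range(1, n + 1):
        lam = 2.0 * math.pi * i / n
        a = 1.0 + sum(c * cmath.exp(1j * (r + 1) * lam) for r, c in enumerate(fwd))
        b = sum(c * cmath.exp(-1j * (r + 1) * lam) for r, c in enumerate(bwd))
        log_h += -2.0 * math.log(abs(a + b))  # ln ||H||^2 = -ln |A + B|^2
    return n * math.log(rho_hat) + log_h + n_k * math.log(n)

def choose_model(candidates, n):
    """candidates: list of (rho_hat, fwd, bwd); returns (k*, statistics)."""
    stats = [decision_statistic(rho, fwd, bwd, n) for rho, fwd, bwd in candidates]
    return min(range(len(stats)), key=stats.__getitem__), stats
```

With $N = 100$, a one-coefficient model that lowers the residual variance only from 1.0 to 0.99 is rejected in favor of the coefficient-free model, because the gain $100 \ln 0.99$ is smaller in magnitude than the penalty $\ln 100$. This is the quantitative form of the parsimony principle that the rule encodes.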

The form of the decision statistics is similar to that reported in
the literature [2][5]. The first two terms represent the contribution
from the likelihood of transforms of the observations and the third
term is due to the prior probability density function. We show that
the decision statistic reduces to that reported in [2] for UAR
models. For these models the first simplification is that the
Jacobian $|J_k|$ of the transformation from noise variates to
observations is unity. Hence [13], for the kth model

$$\ln J_k = -\frac{N}{2(2\pi)} \int_{-\pi}^{\pi} \ln \|H_k(e^{j\lambda}, \phi)\, H_k^*(e^{j\lambda}, \phi)\|\, d\lambda = -\frac{1}{2} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 = 0 \tag{3.21}$$

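Also the coefficients in the expression $B_k(\phi, D^{-1})$ are
identically zero. The equations for $\hat\phi_k$ and $\hat\rho_k$
reduce to (3.22) and (3.23).

The function $g(\theta; N)$ appearing in (3.16) is

$$g(\theta; N) = -\Big[ (\rho - \hat\rho_k)^2 d_k + (\phi - \hat\phi_k)^T Q_k (\phi - \hat\phi_k) + (\rho - \hat\rho_k)(\phi - \hat\phi_k)^T S_k \Big]$$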

Comments: (1) We have obtained an approximate expression for
$\ln P(C_k \mid Z_N)$. Hence the decision rule that maximizes
$P(C_k \mid Z_N)$ obtained here does not exactly minimize the
probability of error.

(2) The expression suggested in (3.16) involves a term due to the
prior probability density. The prior probabilities should be chosen
to reflect the degree of knowledge we possess about the parameters.
Following Jeffreys, we suggest a uniform distribution for each of the
components of $\phi$ and a uniform distribution for $\ln \rho$ [12].

We suggest a decision rule that approximates the minimum probability
of error rule:

Decide hypothesis is $k^*$ if

$$k^* = \arg\max_k \{h_k(Z_N)\}$$

where

$$h_k(Z_N) = -\frac{N}{2} f(\hat\phi_k, \hat\rho_k) - \frac{n_k}{2} \ln N + \ln p(\hat\phi_k, \hat\rho_k \mid C_k) - \frac{1}{2} \ln \det F_{n_k}(-g(\theta; N)) \tag{3.19}$$

We establish the consistency of the decision rule in the next
section.

For UAR models,

$$\hat\phi_k = \arg\min_{\phi} \left\{ \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2} \right\} \tag{3.22}$$

and

$$\hat\rho_k = \frac{1}{N} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \hat\phi_k)\|^2} \tag{3.23}$$

The statistic $f(\hat\phi_k, \hat\rho_k)$ reduces to

$$f(\hat\phi_k, \hat\rho_k) = \ln \hat\rho_k + 1$$

and the decision statistic is

$$h_k(Z_N) = N \ln \hat\rho_k + n_k \ln N \tag{3.24}$$

which is the statistic reported in [3] for UAR model fitting.

In the next section we prove the consistency of the decision rule.


4. Consistency of the decision rule


Definition: Let $P_j(Z_N \mid C_i)$ denote the probability that we
choose the model $C_j$ when the true model is $C_i$. Let the
observations obey the model $C_i$. The decision rule is said to be
consistent if $P_j(Z_N \mid C_i) \to 0$ uniformly as $N \to \infty$
for all $j \ne i$.

For simplicity we assume that there are two hypotheses $C_1$ and
$C_2$. Let the observations obey the hypothesis $C_1$. Then $C_2$
could belong to one of the following two cases:

Case (i): (Over-specified hypothesis)

$C_2$ is overspecified w.r.t. $C_1$ if there exists a
$\phi' \in R^{n_2}$ such that

$$A_2(\phi', D) = A_1(\phi, D) \quad \text{and} \quad B_2(\phi', D^{-1}) = B_1(\phi, D^{-1})$$

for $\phi = (\phi_1, \phi_2, \ldots, \phi_{n_1})^T$, $\phi_j \ne 0$,
$j = 1, 2, \ldots, n_1$.

Example: Consider the hypotheses $C_1$, $C_2$, and $C_3$ defined by
the equations $E_1$, $E_2$, and $E_3$:

$$E_1: \quad (1 + \phi_1 D + \phi_2 D^{-1} + \phi_3 D^{-2})\, y(t) = \omega(t)$$

$$E_2: \quad (1 + \phi_1 D + \phi_2 D^2 + \phi_3 D^{-1} + \phi_4 D^{-2})\, y(t) = \omega(t)$$

$$E_3: \quad (1 + \phi_1 D^{-1} + \phi_2 D^{-2} + \phi_3 D^{-3})\, y(t) = \omega(t)$$

$C_2$ is overspecified w.r.t. $C_1$ but $C_3$ is not overspecified
w.r.t. $C_1$.

Case (ii): All other models not covered by Case (i).
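
The over-specification test of Case (i) amounts to a set comparison on the shifts that carry a coefficient. The sketch below encodes each equation by the set of exponents of $D$ appearing in it. This encoding, the function name, and the choice to treat a model as not overspecified with respect to itself are assumptions made for illustration.

```python
def is_overspecified(candidate_shifts, true_shifts):
    """True if every shift of the true model also appears in the candidate
    and the candidate has at least one extra shift, so that the extra
    coefficients can be set to zero to recover the true equation."""
    cand, true = set(candidate_shifts), set(true_shifts)
    return true.issubset(cand) and true != cand

# Exponents of D with a coefficient in E1, E2 and E3 of the example.
E1 = {1, -1, -2}
E2 = {1, 2, -1, -2}
E3 = {-1, -2, -3}
```

Here `is_overspecified(E2, E1)` holds because $E_2$ contains every term of $E_1$ plus a $D^2$ term, while `is_overspecified(E3, E1)` fails because $E_3$ has no $D$ term.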

We state and prove a theorem which establishes the consistency of the
decision rule.


5. Discussion


The decision rule developed here completely solves the problem of the
choice of neighbors for one-dimensional AR models for a given
empirical series. The decision rules developed in [1], [2], [3] have
covered only the UAR models. Only Whittle [4] has considered the
problem of BAR models in connection with a line transect, but no
proof is given for the consistency of his decision rule.

The hypotheses $C_i$, $1 \le i \le r$ defined here include both
unilateral and bilateral AR models. The decision rule is consistent,
transitive, and yields a quantitative explanation for the principle
of parsimony used in model building. The asymptotic analysis given
here holds for large values of $N$, about 100-200.

The Bayesian approach has two advantages: (1) it yields consistent
decision rules for choosing the correct model; (2) the analysis
yields an explicit expression for the probability density of
transforms of observations given the model that the observations
obey. This expression could be used for classification purposes.

6. Applications and extensions

Assume that we are given a sequence of sampled gray levels along a
row of an image. We can consider UAR models of orders one and two and
BAR models of orders one and two as a set of plausible models and use
the theory developed here to choose the best model with minimum
probability of error.

As the Bayesian approach yields an explicit expression for the
probability density of observations given a model, better rules can
be developed for classification purposes.

The theory can be easily extended to cover bilateral autoregressive
and moving average (BARMA) models by appropriately modifying the
structure of the transfer function and the associated stability
conditions.

The extension of the theory described here to stationary random
fields is considered in Part II.
Part II


Statistical Inference Theory
Applied to Texture Representation


1. Introduction

We are interested in developing statistical models for textures. In
particular, we are interested in applying the theory of statistical
inference of stationary random fields to images. We assume that the
textures under consideration are sample functions of stationary
random fields, not necessarily isotropic. The organization of the
paper is as follows: In the rest of this section we review earlier
research done in image modeling and motivate the inference approach.
In Section 2 we formulate the problem and develop a decision rule for
inferring the correct model with minimum probability of error. We
also assert the consistency of the decision rule. Section 3 compares
the theory developed here with other known approaches in the
literature. Applications and possible extensions are indicated in
Section 4.


1.1 Types of models

Early attempts at image modeling [14], [15] applied one-dimensional
time series analysis to two-dimensional images. By concatenation of
successive rows, a one-dimensional time series is generated. Seasonal
autoregressive integrated moving average models [14] and seasonal
autoregressive models [15] have been fitted to this time series.

It is intuitively clear that any model should reflect the
two-dimensional nature of the image. Tou et al. [16] considered
two-dimensional autoregressive and moving average models for
textures. By differencing along rows and/or columns, nonstationarity
in the series is removed. By inspection of autocorrelation functions
tentative models are determined and the parameters are estimated by
least squares methods. In a subsequent paper [17] classification
rules are given based on differences between the gray levels of the
original texture and regenerated texture.

Recently, Kashyap [18] has suggested a two-dimensional unilateral
autoregressive model for images. A vector of sufficient statistics
has been derived for the parameters of the model which by definition
possess all the information in the samples. Consistent rules have
been developed to determine the width of the one-sided neighborhood.

It is not clear if these unilateral models, though they are
two-dimensional, are appropriate for images, since for images the
neighborhood dependence extends in all directions. Though Whittle [4]
has considered equivalent unilateral schemes for a given bilateral
scheme, there are instances where there exists no equivalent finite
unilateral autoregressive (UAR) model for a given bilateral
autoregressive model [4].

Pratt [19][20] and Gagalowicz [21] have suggested two-dimensional
spatial filter models which transform a sequence of white noise
variables into the observed structure. No attempts have been made to
use inference methods for identifying the models.

Stochastic partial differential equations [22] have also been
suggested as models for images. The discretized equivalents of
partial differential equations have been fitted using least squares
techniques. These models cover nonseparable, nearly isotropic images.
No attempts have been made to infer the models from the data.

Two-dimensional linear estimation techniques have been considered for
textures [23]. The gray level at the element (i,j) is assumed to
depend on neighbors within a window surrounding the (i,j) element.
However, the dimensions of the window are determined by using
Akaike's statistics, which apply to a singly indexed sequence. A mean
square criterion is used for determining the coefficients of the
model.

By operating on an array of independent and identically distributed
random variables, using a set of independent parameters which control
the directionality and graininess of the random field generated,
textures are synthesized in [24][25]. Results are given for binary
first order isotropic Markov random fields [24]. The number of
parameters is proportional to the square of the number of gray levels
in the texture and may be large for real textures.

There are many shortcomings in the models that have been discussed
above. Some of the models view an image as a concatenation of rows,
which is clearly inadequate. Some consider two-dimensional but
unilateral models. Even in these cases, except for [18], no attempts
have been made to infer the order of dependency or dimension of
neighborhood. Though two-dimensional neighborhood models have been
considered in [19][20][21], no attempts have been made to
theoretically justify the order of the models. The work of Pratt and
Gagalowicz is motivated by experimental results reported earlier by
Julesz. The models are built with the idea of matching the
correlations of the


1.2 Inference of models


We believe that any realistic model should be inferred from the data
by using statistical inference theory. This is a powerful approach
since the underlying probability distribution of a texture can be
inferred in a consistent way by using the tools of system
identification.

Not much work has been done in the area of image modeling using
statistical inference of random fields. This involves assertions
about the probability distribution of the observed data. This is
usually accomplished by considering parametric forms of probability
distributions with a finite number of parameters which is reasonably
smaller than the number of observations. Statistical inference is
then concerned with choosing among the various parametric
descriptions of the underlying data. As we are interested in building
models for images, our basic models will include bilateral models. We
will also include some unilateral models to check if the bilateral
models are preferred to unilateral models for images.

For the reasons given in Part I, we will do the analysis in the
spectral domain and take a Bayesian approach to fitting models.


2. Decision rule


The case of a random field is exactly analogous to the
one-dimensional case that we have already considered. In this
section, we give the equation for the random field and state the
corresponding stability conditions. We state theorems parallel to the
ones in Section 3 of Part I and suggest the decision rule that
chooses the correct model with minimum probability of error.

We are given a set of observations $y(s)$, $s \in \Omega_s$,
$s = (s_1, s_2)^T$, where $\Omega_s$ is a grid of dimension
$N_1 \times N_2$ with $1 \le s_i \le N_i$, $i = 1, 2$, from a
stationary random field, and $r$ mutually exclusive compound
hypotheses $C_i$, $1 \le i \le r$. We define the ith parametric form
of the random field as follows:

$$E_i: \quad \sum_{q \in Q} A_i(q)\, y(s + q) = u(s) \tag{2.1}$$

where $Q$ is a finite set of two-dimensional vector shifts and $u(s)$
is an independent and identically distributed Gaussian random field
with mean zero and variance $\rho$. $(A_i(q), q \in Q)$ and $\rho$
are unknown.

Let $\phi^T = (A_i(q), q \in Q)$ and $\theta^T = (\phi^T, \rho)$.

Define the shift operator

$$D^q y(s) = y(s + q), \quad D^T = (D_1, D_2), \quad D^q = D_1^{q_1} D_2^{q_2}$$

such that

$$D_1^{q_1} D_2^{q_2}\, y(s) = y(s_1 + q_1, s_2 + q_2)$$

and

$$H_i(D, \phi) = \sum_{q \in Q} A_i(q)\, D^q$$

Then (2.1) can be represented as

$$H_i(D, \phi)\, y(s) = u(s) \tag{2.2}$$

Consider the Z transform of (2.2), where $z = (z_1, z_2)^T$ is a
complex vector:

$$H_i(z, \phi)\, Y(z) = U(z) \tag{2.3}$$

We make the following assumption (A1) about the stability of the
equation (2.1).

A1): $H_i(z, \phi) \ne 0$ for $|z_1| = |z_2| = 1$.
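
Assumption (A1) can be screened numerically by evaluating $H_i$ on a grid of points $z_1 = e^{j\lambda_1}$, $z_2 = e^{j\lambda_2}$. The sketch below is only a necessary check, since a finite grid cannot rule out a zero between grid points; the function names, the dictionary encoding of the coefficients $A(q)$, and the example coefficient values are assumptions for illustration.

```python
import cmath
import math

def h_2d(lam, coeffs):
    """H(e^{j lam}, phi) = sum_q A(q) * exp(j*(q1*lam1 + q2*lam2));
    coeffs maps each shift (q1, q2) to its coefficient A(q)."""
    return sum(a * cmath.exp(1j * (q[0] * lam[0] + q[1] * lam[1]))
               for q, a in coeffs.items())

def min_abs_h_on_grid(coeffs, n1, n2):
    """Smallest |H| over lambda_i = 2*pi*k_i/n_i, 0 <= k_i < n_i."""
    return min(abs(h_2d((2.0 * math.pi * k1 / n1, 2.0 * math.pi * k2 / n2), coeffs))
               for k1 in range(n1) for k2 in range(n2))

# Hypothetical symmetric four-neighbor models.
stable = {(0, 0): 1.0, (1, 0): -0.2, (-1, 0): -0.2, (0, 1): -0.2, (0, -1): -0.2}
marginal = {(0, 0): 1.0, (1, 0): -0.25, (-1, 0): -0.25, (0, 1): -0.25, (0, -1): -0.25}
```

For the first model $H = 1 - 0.4\cos\lambda_1 - 0.4\cos\lambda_2$ stays at or above 0.2 on the unit torus, whereas for the second $H$ vanishes at $\lambda = (0, 0)$, so (A1) is violated.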

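Let $(x(\lambda), \lambda \in \Omega_\lambda)$,
$\lambda = (\lambda_1, \lambda_2)^T$,
$\lambda_i = 2\pi k_i/N_i$, $0 \le k_i < N_i$, $i = 1, 2$,
$N = N_1 N_2$, denote the finite Fourier transforms of the
observations $(y(s), s \in \Omega_s)$ from the random field. As
discussed in Section 3, Part I, the decision rule that chooses the
correct model with minimum probability of error is:

Decide hypothesis is $k^*$ if

$$k^* = \arg\max_k \{P(C_k \mid x(\lambda), \lambda \in \Omega_\lambda)\} \tag{2.4}$$

We have

$$P(C_i \mid x(\lambda), \lambda \in \Omega_\lambda) = \frac{p(x(\lambda), \lambda \in \Omega_\lambda \mid C_i)\, P(C_i)}{p(x(\lambda), \lambda \in \Omega_\lambda)}$$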

We now state Theorem 1', which is a generalization of Theorem 1.

Theorem 1': Let the observations $(y(s), s \in \Omega_s)$ obey the $k$th
equation $E_k$. Then as the rectangle $\Omega_s$ becomes large in all
dimensions of $s$, the finite Fourier transform is approximately normally
distributed with mean zero, independently at different frequencies, with
variances

$$S_{yk}(e^{j\lambda}, \theta) = \rho\, H_k(e^{j\lambda}, \phi)\, H_k^*(e^{j\lambda}, \phi) \quad \text{for all } \lambda \in \Omega_\lambda.$$

We need the following assumption (A2').

A2'): The first and second derivatives of $\ln \|H_k(e^{j\lambda}, \phi)\|^2$ and
$\sum_{\lambda \in \Omega_\lambda} \|x(\lambda)\|^2 / \|H_k(e^{j\lambda}, \phi)\|^2$
with respect to $\phi$ exist for all $\phi \in R^{n_k}$.

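We state Theorem 2' as a generalization of Theorem 2.

Theorem 2': As the rectangle $\Omega_s$ becomes large in all dimensions
of $s$,

$$\begin{aligned}
\ln p(x(\lambda), \lambda \in \Omega_\lambda \mid \theta, C_k)
= {} & -(N/2)\, f(\theta_k) - (N/2) \big[ (\rho - \rho_k)^2 d_k + (\phi - \phi_k)^T Q_k (\phi - \phi_k) \\
& + (\rho - \rho_k)(\phi - \phi_k)^T s_k + O(\|\theta - \theta_k\|^3) \big]
\end{aligned} \tag{2.7}$$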


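where

$$f(\theta_k) = 1 + \ln \rho_k + \frac{1}{N} \sum_{\lambda \in \Omega_\lambda} \ln \|H_k(e^{j\lambda}, \phi_k)\|^2 \tag{2.8}$$

$$\rho_k = \frac{1}{N} \sum_{\lambda \in \Omega_\lambda} \frac{\|x(\lambda)\|^2}{\|H_k(e^{j\lambda}, \phi_k)\|^2} \tag{2.9}$$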
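The quantities in (2.8) and (2.9) can be evaluated directly from the finite Fourier transform of the observations. The following is a minimal sketch under stated assumptions rather than the authors' procedure: it assumes the transform is normalized by $1/\sqrt{N}$, indexes sites from zero (which changes only the phase of $x(\lambda)$), uses a direct summation instead of a fast Fourier transform, and evaluates the expressions at a coefficient set supplied by the caller instead of searching for the maximum likelihood estimate. All function names are hypothetical.

```python
import cmath
import math

def finite_fourier_transform(y):
    """x(lam) on the grid lam_i = 2*pi*k_i/N_i, with an assumed 1/sqrt(N) scaling."""
    n1, n2 = len(y), len(y[0])
    scale = 1.0 / math.sqrt(n1 * n2)
    x = {}
    for k1 in range(n1):
        for k2 in range(n2):
            lam1, lam2 = 2 * math.pi * k1 / n1, 2 * math.pi * k2 / n2
            x[(k1, k2)] = scale * sum(
                y[s1][s2] * cmath.exp(-1j * (lam1 * s1 + lam2 * s2))
                for s1 in range(n1) for s2 in range(n2))
    return x

def transfer_function(coeffs, lam1, lam2):
    """H(e^{j*lam}, phi) = sum over q of A(q) * exp(j*(q1*lam1 + q2*lam2))."""
    return sum(a * cmath.exp(1j * (q1 * lam1 + q2 * lam2))
               for (q1, q2), a in coeffs.items())

def rho_and_f(y, coeffs):
    """Return (rho_k, f) from (2.9) and (2.8) for the supplied coefficients."""
    n1, n2 = len(y), len(y[0])
    n = n1 * n2
    ratio_sum = 0.0
    log_sum = 0.0
    for (k1, k2), xv in finite_fourier_transform(y).items():
        h2 = abs(transfer_function(coeffs,
                                   2 * math.pi * k1 / n1,
                                   2 * math.pi * k2 / n2)) ** 2
        ratio_sum += abs(xv) ** 2 / h2   # ||x(lam)||^2 / ||H_k||^2
        log_sum += math.log(h2)          # ln ||H_k||^2
    rho = ratio_sum / n
    return rho, 1.0 + math.log(rho) + log_sum / n

# Tiny made-up field; with H = 1 the estimate reduces to the mean square of y.
y = [[1.0, 2.0], [3.0, 4.0]]
print(rho_and_f(y, {(0, 0): 1.0}))
```

With $H_k \equiv 1$ the value of $\rho_k$ is the average of $y(s)^2$, by Parseval's relation for the assumed normalization.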


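Theorem 3': As the rectangle $\Omega_s$ becomes large in all dimensions
of $s$,

$$\begin{aligned}
\ln P(C_k \mid x(\lambda), \lambda \in \Omega_\lambda)
= {} & -(N/2)\, f(\theta_k) + \ln p(\theta_k \mid C_k) + \tfrac{1}{2}(n_k + 1) \ln(2\pi/N) \\
& + (N/2) \ln 2\pi - \tfrac{1}{2} \ln\big(\det F_{n_k}(-g(\theta; N))\big)\big|_{\theta = \theta_k} \\
& + \ln P(C_k) - \ln p(x(\lambda), \lambda \in \Omega_\lambda)
\end{aligned} \tag{2.15}$$

where

$$g(\theta; N) = -\big[ (\rho - \rho_k)^2 d_k + (\phi - \phi_k)^T Q_k (\phi - \phi_k) + (\rho - \rho_k)(\phi - \phi_k)^T s_k \big].$$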

The  proof  given  in  Appendix  II  can  be  easily  extended  to 
prove  this  theorem,  and  the  comments  following  Theorem  3 hold 
here  also. 

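Now we give the approximate decision rule that chooses the
correct model with minimum probability of error:

Decide the hypothesis is $k^*$ if

$$k^* = \arg\max_k \{ h_k(x(\lambda), \lambda \in \Omega_\lambda) \}, \tag{2.17}$$

where

$$h_k(x(\lambda), \lambda \in \Omega_\lambda) = -(N/2)\, f(\theta_k) - (n_k/2) \ln N + \ln p(\theta_k \mid C_k) - \tfrac{1}{2} \ln\big(\det F_{n_k}(-g(\theta; N))\big)\big|_{\theta = \theta_k}. \tag{2.18}$$

For practical applications, we suggest a simplified decision rule:

Decide the hypothesis is $k^*$ if

$$k^* = \arg\min_k \{ h_k(x(\lambda), \lambda \in \Omega_\lambda) \}. \tag{2.19}$$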
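The statistic minimized in the simplified rule is not written out at this point. The sketch below assumes, purely for illustration, that it is $N f(\theta_k) + n_k \ln N$, that is, $-2$ times (2.18) with the prior and determinant terms dropped. The function names and the numerical values are hypothetical.

```python
import math

def simplified_statistic(f_k, n_k, n_obs):
    """Assumed form: N * f(theta_k) + n_k * ln N (smaller is better)."""
    return n_obs * f_k + n_k * math.log(n_obs)

def choose_model(candidates, n_obs):
    """Return the name of the candidate with the smallest assumed statistic.

    candidates maps a model name to (f_k, n_k), where f_k is f(theta_k)
    from (2.8) and n_k is the number of coefficients in phi_k.
    """
    return min(candidates,
               key=lambda name: simplified_statistic(*candidates[name], n_obs))

# Hypothetical fitted values on a 32 x 32 grid (N = 1024).
candidates = {"4 neighbors": (1.500, 4), "8 neighbors": (1.495, 8)}
print(choose_model(candidates, 32 * 32))
```

Under this assumed form, a larger neighbor set is chosen only when it lowers $N f(\theta_k)$ by more than the extra penalty $(n_2 - n_1) \ln N$.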


Discussion  and  comparisons 


The topic of statistical inference on random fields, which
is of primary interest in this paper, has been previously
considered by Whittle [4] and recently by Larimore [9]. Whittle
developed spectral methods for stationary autoregressive
scalar random fields. However, this work preceded the development
of fast Fourier transform algorithms, and no attempt was made
to prove that the criterion for choosing the right model is
consistent.

Larimore extended Whittle's method to the case of vector
random fields, but used Akaike's criterion for choosing
the best model. As observed in [5], there is no proof of
the optimality of Akaike's rule, and it has recently been proved
that Akaike's rule is inconsistent [26].

We have suggested a consistent decision rule that chooses a
correct model with minimum probability of error. The theory can
be extended to include moving average terms in the stochastic
difference equation. This modifies the conditions for stability,
and the numerical computations for estimating the coefficients
become more complex.

We believe that this approach will be of use in image modeling.
So far, researchers in image modeling have either considered
unilateral models or have not used system identification methods
to choose the correct model. The inference procedure developed
here chooses the correct model with minimum probability of error
for unilateral as well as bilateral models.

This approach yields an explicit expression for the probability
density of the transforms of the observations, given the model the
observations obey. This has not been done before for bilateral
models. The expression could be used for classification purposes,
and this approach should result in good classification strategies
for textures.

It should be pointed out that the theory developed here is
based on the assumption that the random field is Gaussian. This
assumption has often been used in the literature on image
modeling [15], [17], [18].


4. Extensions

We propose to use the models developed here to develop
better classification rules for textures. The theory developed
here can be extended to the case of vector random fields. This
will be useful for building models for observation pairs such as:

a) (gray level, edge value)

b) (gray level, average gray level over a neighborhood)

We will treat these extensions in subsequent papers.


Appendix I

We prove Theorem 2.

Consider equation (3.10), repeated below:

$$\ln p(z(\lambda_1), z(\lambda_2), \ldots, z(\lambda_N) \mid \phi, \rho, C_k) \tag{1}$$

We first compute the maximum likelihood estimates of $\phi$ and $\rho$
under the hypothesis $C_k$.

Differentiating (1) with respect to $\rho$ and equating to zero,

$$\rho_k(\phi) = \frac{1}{N} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2}. \tag{2}$$

Substituting (2) in (1), the maximum likelihood estimate (m.l.e.)
$\phi_k$ is given by

$$\phi_k = \arg\min_{\phi} \Big\{ \frac{1}{N} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 + \ln \Big[ \frac{1}{N} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2} \Big] \Big\}$$

and


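where

$$f(\phi, \rho) = \ln \rho + \frac{1}{N} \sum_{i=1}^{N} \ln \|H_k(e^{j\lambda_i}, \phi)\|^2 + \frac{1}{N\rho} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_k(e^{j\lambda_i}, \phi)\|^2}.$$

Expanding $f(\phi, \rho)$ as a Taylor series in $\phi$ and $\rho$ at
$\phi = \phi_k$ and $\rho = \rho_k$, we have

$$\begin{aligned}
f(\phi, \rho) = {} & f(\phi_k, \rho_k)
+ \frac{\partial f(\phi, \rho)}{\partial \rho} (\rho - \rho_k)
+ \Big( \frac{\partial f(\phi, \rho)}{\partial \phi} \Big)^T (\phi - \phi_k)
+ \tfrac{1}{2} (\rho - \rho_k)^2 \frac{\partial^2 f(\phi, \rho)}{\partial \rho^2} \\
& + \tfrac{1}{2} (\phi - \phi_k)^T \Big[ \frac{\partial^2 f(\phi, \rho)}{\partial \phi_i\, \partial \phi_j} \Big] (\phi - \phi_k)
+ (\rho - \rho_k)(\phi - \phi_k)^T \frac{\partial^2 f(\phi, \rho)}{\partial \rho\, \partial \phi}
+ O(\|\theta - \theta_k\|^3),
\end{aligned} \tag{7}$$

with $1 \le i, j \le n_k$ and all derivatives evaluated at $\phi = \phi_k$,
$\rho = \rho_k$.

By definition of $\rho_k$ and $\phi_k$,

$$\frac{\partial f(\phi, \rho)}{\partial \rho} \quad \text{and} \quad \frac{\partial f(\phi, \rho)}{\partial \phi},$$

evaluated at $\phi = \phi_k$, $\rho = \rho_k$, are zero.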


Appendix  II 

We prove Theorem 3. We first state a lemma [10] to be used
in the proof of the theorem.

Lemma 1: Consider the integral

$$G(N) = \int_R h(\theta) \exp[g(\theta, N)]\, d\theta \tag{10}$$

where

$R$ is an $n$-dimensional domain in the Euclidean $n$-space,
$\theta$ is an $n$-dimensional vector,
$N$ is a large positive integer,

$g(\theta, N)$ (a) is a bounded function for $N$ large and assumes
an absolute maximum at an interior point
$\hat\theta(N) = (\hat\theta_1(N), \ldots, \hat\theta_n(N))^T$,

(b) $F_n(-g(\theta, N)) > c > 0$ holds in $R$ for $N$ large,

and

$$F_n(-g(\theta, N)) = \det \big\| -g_{\theta_i \theta_j}(\theta, N) \big\|, \quad 1 \le i, j \le n.$$

Here $g_{\theta_i \theta_j}(\theta, N)$ is the second order partial derivative
of $g(\theta, N)$ with respect to $\theta_i$ and $\theta_j$.

Then

$$G(N) = \Big( \frac{2\pi}{N} \Big)^{n/2} \frac{\exp[g(\hat\theta(N), N)]\, h(\hat\theta(N))}{[F_n(-g(\theta, N))]^{1/2}} \bigg|_{\theta = \hat\theta(N)}$$


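Proof of Theorem 3:

We have to perform the integration

$$\iint p(z(\lambda_1), \ldots, z(\lambda_N) \mid \phi, \rho, C_k)\, p(\phi, \rho \mid C_k)\, d\phi\, d\rho. \tag{12}$$

Substituting for $p(z(\lambda_1), \ldots, z(\lambda_N) \mid \phi, \rho, C_k)$ from (9), we get

LHS of (12)

$$= (1/2\pi)^{N/2} \exp\big( -\tfrac{N}{2} f(\phi_k, \rho_k) \big) \iint \exp\Big\{ \tfrac{N}{2} \big( -(\rho - \rho_k)^2 d_k - (\phi - \phi_k)^T Q_k (\phi - \phi_k) - (\rho - \rho_k)(\phi - \phi_k)^T s_k \big) \Big\}\, p(\phi, \rho \mid C_k)\, d\phi\, d\rho. \tag{13}$$

Identifying the terms in (13) with the terms defined in Lemma 1,
we get

$$p(Z_N \mid C_k) = (1/2\pi)^{N/2} \exp\big( -\tfrac{N}{2} f(\phi_k, \rho_k) \big) \Big( \frac{2\pi}{N} \Big)^{(n_k + 1)/2} \frac{p(\phi_k, \rho_k \mid C_k)}{[F_k(-g(\theta, N))]^{1/2}} \bigg|_{\theta = \hat\theta(N)}$$

where

$$g(\theta, N) = -\big[ (\rho - \rho_k)^2 d_k + (\phi - \phi_k)^T Q_k (\phi - \phi_k) + (\rho - \rho_k)(\phi - \phi_k)^T s_k \big].$$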


Appendix  III 

We prove the consistency of the decision rule given in
Theorem 3.

We have

$$P_2(Z_N \mid C_1) = \mathrm{Prob}[h_1 > h_2 \mid C_1].$$

We evaluate $\mathrm{Prob}[h_1 > h_2 \mid C_1]$ for the two cases mentioned in
Section 4.


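Case (i):

Now,

$$h_1 - h_2 = N(\ln \rho_1 - \ln \rho_2) + \sum_{i=1}^{N} \ln \|H_1(e^{j\lambda_i}, \phi_1)\|^2 - \sum_{i=1}^{N} \ln \|H_2(e^{j\lambda_i}, \phi_2)\|^2 + (n_1 - n_2) \ln N + C_1$$

where

$$\begin{aligned}
C_1 = {} & 2 \ln\big[ p(\phi_2, \rho_2 \mid C_2) / p(\phi_1, \rho_1 \mid C_1) \big]
+ \ln \det F_1(-g(\theta, N))\big|_{\theta = \hat\theta(N)} \\
& - \ln \det F_2(-g(\theta, N))\big|_{\theta = \hat\theta(N)}
+ 2 \ln\big[ P(C_2) / P(C_1) \big].
\end{aligned}$$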


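Using $\ln x \approx x - 1$, so that $\ln \rho_1 - \ln \rho_2 = -\ln(\rho_2/\rho_1) \approx (\rho_1 - \rho_2)/\rho_1$,

$$h_1 - h_2 \approx \frac{N(\rho_1 - \rho_2)}{\rho_1} + \sum_{i=1}^{N} \ln \|H_1(e^{j\lambda_i}, \phi_1)\|^2 - \sum_{i=1}^{N} \ln \|H_2(e^{j\lambda_i}, \phi_2)\|^2 + (n_1 - n_2) \ln N + C_1. \tag{18}$$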


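That is,

$$h_1 - h_2 \approx \frac{N(\rho_1 - \rho_2)}{\rho_1} + \Big[ \sum_{i=1}^{N} \ln \|H_1(e^{j\lambda_i}, \phi_1)\|^2 - \sum_{i=1}^{N} \ln \|H_2(e^{j\lambda_i}, \phi_2)\|^2 \Big] - (n_2 - n_1) \ln N + C_1.$$

Recall

$$\rho_1 = \frac{1}{N} \sum_{i=1}^{N} \frac{\|z(\lambda_i)\|^2}{\|H_1(e^{j\lambda_i}, \phi_1)\|^2}. \tag{19}$$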


From Theorem 1, $z(\lambda_i)$ is a complex normal random variable
with mean zero and variance $S_y(e^{j\lambda_i}, \phi, \rho)$, and hence
$\|z(\lambda_i)\|^2$ is distributed as an exponential variate with mean
$S_y(e^{j\lambda_i}, \phi, \rho)$. Equivalently, $\|z(\lambda_i)\|^2$ is
distributed as $\tfrac{1}{2} S_y(e^{j\lambda_i}, \phi, \rho)\, \chi^2(2)$,
where $\chi^2(2)$ is a chi-square distribution with two degrees of
freedom. Also from Theorem 1, $z(\lambda_i)$ is independent of $z(\lambda_j)$ for
$i \ne j$, and hence (19) represents a sum of weighted independent
chi-squared variables. The standard method [27] of obtaining
the distribution of a sum of independent weighted chi-squareds
is to approximate it by a multiple, $k\chi^2(\nu)$, of a chi-squared
variable whose scale factor $k$ and degrees of freedom $\nu$ are determined by
equating first and second order moments. For our purposes, it
suffices to mention that $\rho_1$ is a known multiple chi-squared
random variable.
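The moment-matching step can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the method in [27]: for a sum $\sum_i a_i \chi^2(d_i)$ of independent terms, the mean is $\sum_i a_i d_i$ and the variance is $2 \sum_i a_i^2 d_i$, and equating these to the mean $k\nu$ and variance $2k^2\nu$ of $k\chi^2(\nu)$ gives $k$ and $\nu$. The function name and the example weights are hypothetical.

```python
def match_scaled_chi_square(weights, dofs):
    """Return (k, nu) such that k * chi2(nu) has the same mean and variance
    as sum_i weights[i] * chi2(dofs[i]) with independent terms."""
    mean = sum(a * d for a, d in zip(weights, dofs))
    variance = 2.0 * sum(a * a * d for a, d in zip(weights, dofs))
    k = variance / (2.0 * mean)          # from k * nu = mean, 2 * k^2 * nu = variance
    nu = 2.0 * mean * mean / variance
    return k, nu

# Three terms of the form (1/2) * S * chi2(2) with made-up spectral values S.
spectral_values = [1.0, 2.0, 4.0]
weights = [0.5 * s for s in spectral_values]
print(match_scaled_chi_square(weights, [2, 2, 2]))
```

When all weights are equal the match is exact: a sum of $m$ terms $a\chi^2(2)$ is distributed as $a\chi^2(2m)$.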


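Let

$$\eta = \frac{N(\rho_1 - \rho_2)}{\rho_1} + \frac{1}{n_2 - n_1} \Big[ \sum_{i=1}^{N} \ln \|H_1(e^{j\lambda_i}, \phi_1)\|^2 - \sum_{i=1}^{N} \ln \|H_2(e^{j\lambda_i}, \phi_2)\|^2 \Big] + \frac{C_1}{n_2 - n_1}$$

be a random variable with finite mean and variance $\sigma^2$.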

Hence 


Prob  [h^>h2  |c^] 

s.  Prob[  + >/Nk  & n0]  = Prob[n,i  /Nk] 
2 


k2>0 


& 0(o2/Nk)  = 0(k2/N) , 
by  using  the  Chebychev  inequality. 


(27) 
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